Skip to content

Conversation

@petercad
Copy link

Currently make_block_2d_copy_C always selects a block 2D store, but we really need separate APIs for C (loads) and D (stores). The default value type can also be different, in case of MMA atoms with different C/D types.

This PR introduces C/D APIs. Some APIs are common between C/D, and are named make_block_2d_copy_CD to avoid duplication.

@tdeng5 tdeng5 added the urgent PR requires a urgent attention (for release or blocking another PR) label Oct 22, 2025
rolandschulz pushed a commit that referenced this pull request Oct 22, 2025
Modify 00_bmg_gemm to include new mma and copy atoms
(#477).
00_bmg_gemm combines two parts: mma and epilogue. To add new atom
changes, we need to update both parts since they currently use old
atoms. As starting we will:

> Keep CollectiveEpilogue unchanged for now
> Only modify CollectiveMma first

Old Atom:

Problem Size: 5120x4096x4096x1
Cutlass GEMM Performance:     [96.448]TFlop/s  (1.7813)ms

New Atom:

Problem Size: 5120x4096x4096x1
Cutlass GEMM Performance: [97.259]TFlop/s (1.7664)ms

Also depend on new copy_c/copy_d apis for load/store
#572

---------

Co-authored-by: Anamika Chatterjee <[email protected]>
@rolandschulz rolandschulz enabled auto-merge (squash) October 22, 2025 23:08
Copy link

@sanchitintel sanchitintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#573 reported some performance regressions when this API is used instead of manually selecting copy atoms (or compared to the legacy API) .

But performance in examples/cute/tutorial/xe_gemm.cpp example wasn't adversely affected by these changes, so the performance degradation observed in #573 is likely to be unrelated to this PR (except if the perf degradation is due to loading the canonical C matrix in #573, but if beta is 0, would the compiler still load C? Not sure about that part).

@rolandschulz rolandschulz merged commit 33cae04 into main Oct 23, 2025
2 of 11 checks passed
@sanchitintel
Copy link

#540's commit is newer than the base commit of this PR's branch. #540 had some changes that conflict with this one, so CI is failing.

rolandschulz pushed a commit that referenced this pull request Oct 23, 2025
#540 landed first.
#572 had an older base commit than #540's commit when it was merged.
Resolving conflict
@sanchitintel

This comment was marked as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

urgent PR requires a urgent attention (for release or blocking another PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants